HEAD
## [1] 1599 13
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## Warning in data(wines): data set 'wines' not found
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## NULL
## [1] 7.4 7.8 11.2 7.9 7.3 7.5 6.7 5.6 8.9 8.5 8.1 7.6 6.9 6.3
## [15] 7.1 8.3 5.2 5.7 8.8 6.8 4.6 7.7 8.7 6.4 6.6 8.6 10.2 7.0
## [29] 7.2 9.3 8.0 9.7 6.2 5.0 4.7 8.4 10.1 9.4 9.0 8.2 6.1 5.8
## [43] 9.2 11.5 5.4 9.6 12.8 11.0 11.6 12.0 15.0 10.8 11.1 10.0 12.5 11.8
## [57] 10.9 10.3 11.4 9.9 10.4 13.3 10.6 9.8 13.4 10.7 11.9 12.4 12.2 13.8
## [71] 9.1 13.5 10.5 12.6 14.0 13.7 9.5 12.7 12.3 15.6 5.3 11.3 13.0 6.5
## [85] 12.9 14.3 15.5 11.7 13.2 15.9 12.1 5.1 4.9 5.9 6.0 5.5
## [1] 5 6 7 4 8 3
## [1] 1.90 2.60 2.30 1.80 1.60 1.20 2.00 6.10 3.80 3.90 1.70
## [12] 4.40 2.40 1.40 2.50 10.70 5.50 2.10 1.50 5.90 2.80 2.20
## [23] 3.00 3.40 5.10 4.65 1.30 7.30 7.20 2.90 2.70 5.60 3.10
## [34] 3.20 3.30 3.60 4.00 7.00 6.40 3.50 11.00 3.65 4.50 4.80
## [45] 2.95 5.80 6.20 4.20 7.90 3.70 6.70 6.60 2.15 5.20 2.55
## [56] 15.50 4.10 8.30 6.55 4.60 4.30 5.15 6.30 6.00 8.60 7.50
## [67] 2.25 4.25 2.85 3.45 2.35 2.65 9.00 8.80 5.00 1.65 2.05
## [78] 0.90 8.90 8.10 4.70 1.75 7.80 12.90 13.40 5.40 15.40 3.75
## [89] 13.80 5.70 13.90
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
As for quality, slightly more than half of the wines were rated above average: the median (6) is higher than the mean (5.636). For most other variables the median is below the mean. This is most notable for total.sulfur.dioxide, whose median (38) is substantially below its mean (46.47). According to the dataset description, SO2 becomes evident in smell and taste above 50 ppm, and 25% of wines have more than 62 ppm. For most attributes except density, pH and, to some extent, alcohol, the spread across the quartiles is wide, especially between the minimum and the maximum, which may be caused by outliers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
##
## 3 4 5 6 7 8
## 10 53 681 638 199 18
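As a cross-check on the quality figures quoted in this section, the sketch below recomputes the mean, the median and the share of wines in each band from the frequency table above. It is written in Python rather than the R used for the analysis, and its only input is the table of counts.

```python
from statistics import median

# Counts of wines per quality score, copied from the frequency table above.
quality_counts = {3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}

# Expand the table back into one score per wine (1599 values).
scores = [q for q, n in quality_counts.items() for _ in range(n)]
total = len(scores)

mean_quality = sum(scores) / total
median_quality = median(scores)

# Share of wines scored 6 or higher, and share scored 5 or 6.
share_6_plus = sum(n for q, n in quality_counts.items() if q >= 6) / total
share_5_or_6 = (quality_counts[5] + quality_counts[6]) / total

print(f"n = {total}, mean = {mean_quality:.3f}, median = {median_quality}")
print(f"6 or higher: {share_6_plus:.1%}, 5 or 6: {share_5_or_6:.1%}")
```

The median (6) lands above the mean because the low scores (3 and 4) pull the mean down. About 53% of wines score 6 or higher, and about 82% score 5 or 6.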
How does the distribution of total.sulfur.dioxide differ across quality levels? According to the description of the dataset, there might be a relationship between the two. I also wonder how the other variables affect quality. The table above shows the number of wines at each quality level.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Stacking not well defined when ymin != 0
Let’s see which alcohol degree is the most common.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
##
## 8.4 8.5 8.7 8.8
## 2 1 2 2
## 9 9.05 9.1 9.2
## 30 1 23 72
## 9.23333333333333 9.25 9.3 9.4
## 1 1 59 103
## 9.5 9.55 9.56666666666667 9.6
## 139 2 1 59
## 9.7 9.8 9.9 9.95
## 54 78 49 1
## 10 10.0333333333333 10.1 10.2
## 67 2 47 46
## 10.3 10.4 10.5 10.55
## 33 41 67 2
## 10.6 10.7 10.75 10.8
## 28 27 1 42
## 10.9 11 11.0666666666667 11.1
## 49 59 1 27
## 11.2 11.3 11.4 11.5
## 36 32 32 30
## 11.6 11.7 11.8 11.9
## 15 23 29 20
## 11.95 12 12.1 12.2
## 1 21 13 12
## 12.3 12.4 12.5 12.6
## 12 13 21 6
## 12.7 12.8 12.9 13
## 9 17 9 6
## 13.1 13.2 13.3 13.4
## 2 1 3 3
## 13.5 13.5666666666667 13.6 14
## 1 1 4 7
## 14.9
## 1
A large number of wines have between 9% and 10% alcohol, and the median is 10.2%. I am including the frequency table for alcohol because alcohol has the strongest correlation with quality, and the table shows how many wines have a given alcohol content. The most common value is 9.5%, with 139 wines.
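The frequency table can also be summarised programmatically. The following is a minimal Python sketch, not the author's R code. It finds the most common alcohol level and counts the wines between 9% and 10%, using the counts copied from the table above.

```python
from collections import Counter

# Number of wines at each alcohol level (% vol), copied from the table above.
alcohol_counts = Counter({
    8.4: 2, 8.5: 1, 8.7: 2, 8.8: 2, 9.0: 30, 9.05: 1, 9.1: 23, 9.2: 72,
    9.23333333333333: 1, 9.25: 1, 9.3: 59, 9.4: 103, 9.5: 139, 9.55: 2,
    9.56666666666667: 1, 9.6: 59, 9.7: 54, 9.8: 78, 9.9: 49, 9.95: 1,
    10.0: 67, 10.0333333333333: 2, 10.1: 47, 10.2: 46, 10.3: 33, 10.4: 41,
    10.5: 67, 10.55: 2, 10.6: 28, 10.7: 27, 10.75: 1, 10.8: 42, 10.9: 49,
    11.0: 59, 11.0666666666667: 1, 11.1: 27, 11.2: 36, 11.3: 32, 11.4: 32,
    11.5: 30, 11.6: 15, 11.7: 23, 11.8: 29, 11.9: 20, 11.95: 1, 12.0: 21,
    12.1: 13, 12.2: 12, 12.3: 12, 12.4: 13, 12.5: 21, 12.6: 6, 12.7: 9,
    12.8: 17, 12.9: 9, 13.0: 6, 13.1: 2, 13.2: 1, 13.3: 3, 13.4: 3, 13.5: 1,
    13.5666666666667: 1, 13.6: 4, 14.0: 7, 14.9: 1,
})

total = sum(alcohol_counts.values())
mode_value, mode_count = alcohol_counts.most_common(1)[0]

# Wines with at least 9% but less than 10% alcohol.
between_9_and_10 = sum(n for level, n in alcohol_counts.items() if 9 <= level < 10)

print(f"{total} wines; most common level {mode_value}% ({mode_count} wines)")
print(f"{between_9_and_10} wines ({between_9_and_10 / total:.0%}) have 9-10% alcohol")
```

Counter.most_common returns the level with the highest count, 9.5% with 139 wines. About 42% of all wines fall in the 9-10% band.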
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
For fixed acidity, the median is 7.90 and the mean (8.32) is higher, pulled up by outliers on the right.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
I will create a new variable called total.acidity and check whether it is directly correlated with quality.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
I wonder whether there is any connection between alcohol percentage and quality.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
Half of the wines have between 9.5% and 11.1% alcohol (the interquartile range), and the median is 10.2%.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
132 wines in the dataset contain no citric acid at all. According to the dataset description, citric acid can add freshness and flavour to wines, so I wonder whether it affects the variable quality and how the two might be connected. The median (0.26) is roughly three times the first quartile (0.09), and a quarter of the wines have 0.09 or less, which shows that a large number of wines contain very little citric acid.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Stacking not well defined when ymin != 0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
There is a huge difference between the maximum (15.5) and the third quartile (2.6) of residual sugar, which shows that there are outliers at the upper end of the range. Using scale_y_log10 sheds light on the outliers, and scale_x_log10 reveals a roughly bell-shaped distribution.
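The summary statistics alone show why a log scale helps. The Python sketch below stands in for the ggplot2 scales mentioned above, and tail_ratio is a hypothetical helper. It compares the length of the upper tail with the length of the lower tail, using the minimum, median and maximum of residual.sugar, before and after a log10 transform.

```python
import math

# Minimum, median and maximum of residual.sugar (g/dm^3), from the summary above.
sugar_min, sugar_median, sugar_max = 0.9, 2.2, 15.5

def tail_ratio(lo, mid, hi, transform=lambda v: v):
    """Length of the upper tail divided by the length of the lower tail."""
    return (transform(hi) - transform(mid)) / (transform(mid) - transform(lo))

raw_ratio = tail_ratio(sugar_min, sugar_median, sugar_max)
log_ratio = tail_ratio(sugar_min, sugar_median, sugar_max, math.log10)

print(f"upper/lower tail on the raw scale:   {raw_ratio:.1f}")
print(f"upper/lower tail on the log10 scale: {log_ratio:.1f}")
```

On the raw scale the upper tail is roughly ten times longer than the lower one. After log10 it is only about twice as long, so the transformed histogram looks much closer to symmetric.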
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
Chlorides again show outliers to the right, so I transformed the long-tailed data to understand it better.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Another transformation, this time along the y axis.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Stacking not well defined when ymin != 0
total.sulfur.dioxide seems to be another factor that might have a negative effect on smell and taste, especially above 50 ppm.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
The difference between the mean and the median is larger than for many other variables: the median is 38 and the mean is 46.47. Only 9 samples lie between 150 and 289.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
The free.sulfur.dioxide data is skewed as well, and I had to apply a log transformation to see the distribution. The mean of free.sulfur.dioxide is 15.87 and the median is 14.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
The distribution of sulphates is also right-skewed. There are outliers, but the differences between the quartiles are not as stark.
## [1] 1599
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
The density distribution looks approximately normal, with the first quartile, median, mean and third quartile very close to each other.
There are 1599 observations (red wine samples) in the dataset, with 11 input features (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol) and the output variable quality. Most features except density, pH and quality are right-skewed and have some extreme outliers to the right.
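The skewness claim can be checked against the summary(wines) output. The Python sketch below uses a simple heuristic that is not part of the original analysis. A variable counts as right-skewed when the distance from the third quartile to the maximum is more than twice the distance from the minimum to the first quartile. The threshold of 2 is arbitrary.

```python
# (min, 1st quartile, 3rd quartile, max) for each variable, from summary(wines).
summaries = {
    "fixed.acidity": (4.60, 7.10, 9.20, 15.90),
    "volatile.acidity": (0.12, 0.39, 0.64, 1.58),
    "citric.acid": (0.000, 0.090, 0.420, 1.000),
    "residual.sugar": (0.900, 1.900, 2.600, 15.500),
    "chlorides": (0.012, 0.070, 0.090, 0.611),
    "free.sulfur.dioxide": (1.0, 7.0, 21.0, 72.0),
    "total.sulfur.dioxide": (6.0, 22.0, 62.0, 289.0),
    "density": (0.9901, 0.9956, 0.9978, 1.0037),
    "pH": (2.740, 3.210, 3.400, 4.010),
    "sulphates": (0.33, 0.55, 0.73, 2.00),
    "alcohol": (8.40, 9.50, 11.10, 14.90),
    "quality": (3.0, 5.0, 6.0, 8.0),
}

def upper_tail_ratio(minimum, q1, q3, maximum):
    """How far the max sits above Q3, relative to how far the min sits below Q1."""
    return (maximum - q3) / (q1 - minimum)

THRESHOLD = 2.0  # arbitrary cut-off chosen for illustration

right_skewed = sorted(
    name for name, s in summaries.items() if upper_tail_ratio(*s) > THRESHOLD
)
print(right_skewed)
```

With this heuristic, every variable except density, pH and quality is flagged, which matches the description above. A formal skewness statistic computed on the full data would be a more reliable test.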
The main feature of interest in my dataset is quality. I would like to know which features influenced the experts' quality ratings. I suspect total.sulfur.dioxide, residual.sugar, volatile.acidity and citric.acid have the largest effect.
total.sulfur.dioxide, residual.sugar, volatile.acidity and citric.acid are the features I am most interested in. Looking into other features, or combinations of them, may also help in investigating the dataset and building a model.
I created a new feature called total.acidity, the sum of fixed.acidity and volatile.acidity. I still have to examine whether it is related to quality and whether it improves the model.
Most of the features were right-skewed, and I log-transformed them to get a better sense of the data. For total.sulfur.dioxide the transformation was applied to the y axis. For residual.sugar it was applied to each axis separately, since that feature is both right-skewed and has a wide range of outliers.
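The derived feature is a row-wise sum. The minimal Python sketch below covers the first five rows shown by str(wines). The list names are illustrative; in the R analysis the new column would live in the wines data frame.

```python
# First five rows of fixed.acidity and volatile.acidity (g/dm^3), from str(wines).
fixed_acidity = [7.4, 7.8, 7.8, 11.2, 7.4]
volatile_acidity = [0.70, 0.88, 0.76, 0.28, 0.70]

# total.acidity = fixed.acidity + volatile.acidity, computed row by row.
total_acidity = [f + v for f, v in zip(fixed_acidity, volatile_acidity)]
print([round(t, 2) for t in total_acidity])
```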
## X fixed.acidity volatile.acidity
## X 1.000000000 -0.26848392 -0.008815099
## fixed.acidity -0.268483920 1.00000000 -0.256130895
## volatile.acidity -0.008815099 -0.25613089 1.000000000
## citric.acid -0.153551355 0.67170343 -0.552495685
## residual.sugar -0.031260835 0.11477672 0.001917882
## chlorides -0.119868519 0.09370519 0.061297772
## free.sulfur.dioxide 0.090479643 -0.15379419 -0.010503827
## total.sulfur.dioxide -0.117849669 -0.11318144 0.076470005
## density -0.368372087 0.66804729 0.022026232
## pH 0.136005328 -0.68297819 0.234937294
## sulphates -0.125306999 0.18300566 -0.260986685
## alcohol 0.245122841 -0.06166827 -0.202288027
## quality 0.066452608 0.12405165 -0.390557780
## citric.acid residual.sugar chlorides
## X -0.15355136 -0.031260835 -0.119868519
## fixed.acidity 0.67170343 0.114776724 0.093705186
## volatile.acidity -0.55249568 0.001917882 0.061297772
## citric.acid 1.00000000 0.143577162 0.203822914
## residual.sugar 0.14357716 1.000000000 0.055609535
## chlorides 0.20382291 0.055609535 1.000000000
## free.sulfur.dioxide -0.06097813 0.187048995 0.005562147
## total.sulfur.dioxide 0.03553302 0.203027882 0.047400468
## density 0.36494718 0.355283371 0.200632327
## pH -0.54190414 -0.085652422 -0.265026131
## sulphates 0.31277004 0.005527121 0.371260481
## alcohol 0.10990325 0.042075437 -0.221140545
## quality 0.22637251 0.013731637 -0.128906560
## free.sulfur.dioxide total.sulfur.dioxide density
## X 0.090479643 -0.11784967 -0.36837209
## fixed.acidity -0.153794193 -0.11318144 0.66804729
## volatile.acidity -0.010503827 0.07647000 0.02202623
## citric.acid -0.060978129 0.03553302 0.36494718
## residual.sugar 0.187048995 0.20302788 0.35528337
## chlorides 0.005562147 0.04740047 0.20063233
## free.sulfur.dioxide 1.000000000 0.66766645 -0.02194583
## total.sulfur.dioxide 0.667666450 1.00000000 0.07126948
## density -0.021945831 0.07126948 1.00000000
## pH 0.070377499 -0.06649456 -0.34169933
## sulphates 0.051657572 0.04294684 0.14850641
## alcohol -0.069408354 -0.20565394 -0.49617977
## quality -0.050656057 -0.18510029 -0.17491923
## pH sulphates alcohol quality
## X 0.13600533 -0.125306999 0.24512284 0.06645261
## fixed.acidity -0.68297819 0.183005664 -0.06166827 0.12405165
## volatile.acidity 0.23493729 -0.260986685 -0.20228803 -0.39055778
## citric.acid -0.54190414 0.312770044 0.10990325 0.22637251
## residual.sugar -0.08565242 0.005527121 0.04207544 0.01373164
## chlorides -0.26502613 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.07037750 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide -0.06649456 0.042946836 -0.20565394 -0.18510029
## density -0.34169933 0.148506412 -0.49617977 -0.17491923
## pH 1.00000000 -0.196647602 0.20563251 -0.05773139
## sulphates -0.19664760 1.000000000 0.09359475 0.25139708
## alcohol 0.20563251 0.093594750 1.00000000 0.47616632
## quality -0.05773139 0.251397079 0.47616632 1.00000000
For quality, the strongest positive correlation is with alcohol (0.48). Quality also has weak positive correlations with sulphates (0.25) and citric acid (0.23). Quality is negatively correlated with volatile acidity (-0.39) and weakly negatively correlated with total sulfur dioxide (-0.19) and chlorides (-0.13). Among the other features, fixed acidity has a strong positive correlation with density (0.67) and a strong negative correlation with pH (-0.68).
Using scatterplots to look at the relationships between fixed.acidity, pH, density and citric acid.
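The coefficients above come from cor(wines), whose default method is Pearson's correlation. The self-contained Python sketch below shows that computation; the pearson function is written here for illustration. It is applied to the first ten rows of fixed.acidity and pH from str(wines).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, the default method of R's cor()."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# First ten rows of fixed.acidity and pH, from str(wines).
fixed_acidity = [7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5]
ph = [3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3.39, 3.36, 3.35]

r = pearson(fixed_acidity, ph)
print(f"r on the first ten rows: {r:.2f}")
```

On these ten rows r comes out negative, in line with the -0.68 in the full matrix, but ten rows are far too few to reproduce that value.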
## Warning: Removed 49 rows containing missing values (geom_point).
As citric acid increases, the spread of fixed acidity increases. The relationship between the two seems to be linear.
The plot above shows the linear relationship between the two variables, and the increasing spread, more clearly.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
## Warning: position_stack requires non-overlapping x intervals
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0050 0.0350 0.1710 0.3275 0.6600
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0300 0.0900 0.1742 0.2700 1.0000
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2300 0.2437 0.3600 0.7900
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2600 0.2738 0.4300 0.7800
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3050 0.4000 0.3752 0.4900 0.7600
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0300 0.3025 0.4200 0.3911 0.5300 0.7200
Wines with higher quality tend to have more citric acid: the median rises from 0.035 for quality 3 to 0.42 for quality 8.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
## Warning: position_stack requires non-overlapping x intervals
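There is a correlation between alcohol content and quality, and no low-alcohol wine has high quality. The numbers confirm this: the table below counts the wines rated 7 or 8 at each alcohol level, and none of them has less than 9.2% alcohol.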
## Warning: Removed 1 rows containing non-finite values (stat_summary).
##
## 9.2 9.5 9.7 9.8
## 2 2 2 2
## 9.9 10 10.1 10.2
## 4 9 2 4
## 10.3 10.4 10.5 10.55
## 1 1 10 1
## 10.6 10.7 10.8 10.9
## 6 1 11 5
## 11 11.1 11.2 11.3
## 13 4 10 8
## 11.4 11.5 11.6 11.7
## 3 6 6 13
## 11.8 11.9 12 12.1
## 11 5 9 8
## 12.2 12.3 12.4 12.5
## 4 7 6 10
## 12.6 12.7 12.8 12.9
## 3 3 8 4
## 13 13.1 13.3 13.4
## 2 1 1 2
## 13.5666666666667 13.6 14
## 1 3 3
Wines with higher alcohol usually have higher quality.
Most wines have a quality of 5 or 6.
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.400 9.725 9.925 9.955 10.580 11.000
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 9.60 10.00 10.27 11.00 13.10
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.5 9.4 9.7 9.9 10.2 14.9
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.80 10.50 10.63 11.30 14.00
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.20 10.80 11.50 11.47 12.10 14.00
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.80 11.32 12.15 12.09 12.88 14.00
The highest-quality wines (8) have the highest median alcohol (12.15%). The lowest-quality wines (3) have the second-lowest median (9.925%); only wines scored 5 have a lower one (9.7%).
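The per-quality summaries in this section are printed by R's by() function, which splits a variable by quality and summarises each group. The minimal Python sketch below shows the same grouping step. It uses only the first ten rows from str(wines), so the medians it prints are not the ones in the tables above.

```python
from collections import defaultdict
from statistics import median

# First ten rows of alcohol (% vol) and quality, from str(wines).
alcohol = [9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.5, 10.5]
quality = [5, 5, 5, 6, 5, 5, 5, 7, 7, 5]

# Group alcohol values by quality score, like by(wines$alcohol, wines$quality, summary).
groups = defaultdict(list)
for a, q in zip(alcohol, quality):
    groups[q].append(a)

for q in sorted(groups):
    print(f"quality {q}: n = {len(groups[q])}, median alcohol = {median(groups[q])}")
```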
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.160 3.312 3.390 3.398 3.495 3.630
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.300 3.370 3.382 3.500 3.900
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.880 3.200 3.300 3.305 3.400 3.740
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.860 3.220 3.320 3.318 3.410 4.010
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.920 3.200 3.280 3.291 3.380 3.780
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.880 3.162 3.230 3.267 3.350 3.720
There is a weak trend towards more acidic wines (lower pH) having higher quality scores. Although the correlation is very weak (-0.06), the median pH is lowest (3.23) for wines with quality 8 and highest (3.39) for those with quality 3.
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4400 0.6475 0.8450 0.8845 1.0100 1.5800
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.230 0.530 0.670 0.694 0.870 1.130
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.180 0.460 0.580 0.577 0.670 1.330
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.3800 0.4900 0.4975 0.6000 1.0400
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3000 0.3700 0.4039 0.4850 0.9150
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2600 0.3350 0.3700 0.4233 0.4725 0.8500
Volatile acidity shows a relatively stronger, negative correlation with quality: the median falls from 0.845 for quality 3 to 0.37 for quality 7 and 8.
There seems to be a positive correlation between citric acid and density, but their correlations with quality have opposite signs.
## Warning: Removed 162 rows containing non-finite values (stat_smooth).
## Warning: Removed 162 rows containing missing values (geom_point).
The relationship between citric.acid and density seems to be linear, but it is weak and the data points are very dispersed (there is a large variation).
##
## Calls:
## m1: lm(formula = density ~ citric.acid, data = subset(wines, citric.acid >
## 0 & citric.acid <= quantile(wines$citric.acid, 0.999)))
##
## =============================
## (Intercept) 0.996***
## (0.000)
## citric.acid 0.004***
## (0.000)
## -----------------------------
## R-squared 0.1
## adj. R-squared 0.1
## sigma 0.0
## F 210.8
## p 0.0
## Log-likelihood 7229.2
## Deviance 0.0
## AIC -14452.3
## BIC -14436.5
## N 1465
## =============================
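The model that uses citric.acid to explain density accounts for only about 13% of the variance, which is small. The table rounds R-squared to 0.1; with one predictor, R-squared = F / (F + N - 2) = 210.8 / 1673.8 ≈ 0.13.
There is a correlation between density and fixed.acidity: the higher the fixed.acidity, the higher the density.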
##
## Calls:
## m2: lm(formula = quality ~ alcohol, data = subset(wines, alcohol >
## 0 & alcohol <= quantile(wines$alcohol, 0.999)))
##
## =============================
## (Intercept) 1.818***
## (0.175)
## alcohol 0.366***
## (0.017)
## -----------------------------
## R-squared 0.2
## adj. R-squared 0.2
## sigma 0.7
## F 480.4
## p 0.0
## Log-likelihood -1715.4
## Deviance 800.7
## AIC 3436.8
## BIC 3452.9
## N 1598
## =============================
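Despite the correlation of 0.48 between alcohol and quality, the model explains only about 23% of the variance in quality. This is expected: with a single predictor, R-squared is the square of the correlation coefficient, and 0.476 squared is about 0.23.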
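The gap between the correlation and R-squared follows from simple arithmetic. For a linear regression with one predictor and an intercept, R-squared equals the square of Pearson's r. Here is a short Python check using the coefficients from the correlation matrix.

```python
# Correlations with quality, from the cor(wines) matrix above.
r_alcohol = 0.47616632
r_volatile = -0.39055778

# In a simple linear regression with an intercept, R-squared equals r squared.
r2_alcohol = r_alcohol ** 2
r2_volatile = r_volatile ** 2

print(f"alcohol: r = {r_alcohol:.3f}, R-squared = {r2_alcohol:.3f}")
print(f"volatile.acidity: r = {r_volatile:.3f}, R-squared = {r2_volatile:.3f}")
```

The alcohol value (about 0.227) matches the single-predictor model fitted on all 1599 rows. The model above drops one row, so its R-squared differs slightly. Volatile acidity gives about 0.153, which the table rounds to 0.2.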
##
## Calls:
## m3: lm(formula = quality ~ volatile.acidity, data = wines)
##
## ===============================
## (Intercept) 6.566***
## (0.058)
## volatile.acidity -1.761***
## (0.104)
## -------------------------------
## R-squared 0.2
## adj. R-squared 0.2
## sigma 0.7
## F 287.4
## p 0.0
## Log-likelihood -1794.3
## Deviance 883.2
## AIC 3594.6
## BIC 3610.8
## N 1599
## ===============================
Only about 15% of the variance is explained here (the squared correlation is (-0.39) squared ≈ 0.15; the table rounds it to 0.2). Perhaps I should add more features to the model in the next part.
There is a moderate correlation between quality and volatile.acidity, a stronger one between quality and alcohol, and weaker ones with citric.acid, sulphates and density.
As one would expect, there are stronger correlations between related features, such as fixed.acidity and pH (pH is a measure of acidity).
Wines with more citric acid, alcohol and sulphates are more likely to have higher quality, while the correlation with volatile.acidity is negative.
Most wines have a quality of 5 or 6 (about 82%).
The variation of all features is large, and the correlations are weak except between features that are related by nature, such as acidity and pH. The scatterplots are also very dispersed.
Wines with more citric acid seem to have a higher density.
Using a single feature to explain the variance in quality does not give good results (R-squared of about 0.23 at best). In the next section I will use more than one feature and see whether the model improves.
There is a correlation between free.sulfur.dioxide and total.sulfur.dioxide, which is understandable because one is a subset of the other. There is also one between citric.acid and density, and an even stronger one between density and fixed.acidity.
The strongest relationship is between fixed.acidity and pH: the higher the fixed.acidity, the lower the pH. There is also a strong correlation between density and fixed.acidity. None of these features has a very strong relationship with quality.
I made the second plot with only the highest and lowest quality levels to make the distinction clearer; the first plot covers all quality levels. Comparing the lowest and the highest quality, for the same amount of sulphates the higher-quality wines seem to have lower pH.
As expected, for the same amount of sulphates, wines with more alcohol seem to have higher quality.
The general trend is that wines with higher volatile.acidity have lower quality, which matches the correlation results. For higher qualities, higher alcohol seems to compensate for higher volatile.acidity.
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.04975 0.35250 1.56300 3.21400 5.94000
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.270 0.882 1.757 2.700 9.400
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.950 2.185 2.412 3.572 10.270
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.972 2.764 2.923 4.654 9.112
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.310 4.720 4.288 5.685 9.880
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.420 3.375 5.116 4.624 6.160 8.978
The product of the two features that are positively correlated with quality seems to show their effect on quality more clearly.
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.360 5.171 5.320 5.637 5.681 8.514
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.003 4.753 5.871 6.092 6.612 18.800
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.705 5.130 5.723 6.145 6.615 19.400
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.171 5.985 6.763 7.171 8.033 19.300
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.563 7.326 8.378 8.493 9.564 13.560
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.150 8.045 9.322 9.257 10.520 11.480
The same association with quality appears here as well.
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00245 0.01930 0.11220 0.23840 0.37620
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0144 0.0456 0.1290 0.1408 2.0000
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0510 0.1292 0.1602 0.2310 0.9576
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0567 0.1680 0.1925 0.2965 0.9044
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2008 0.2976 0.2837 0.3893 0.7344
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0246 0.2164 0.3424 0.2966 0.3675 0.5904
The median of the product of sulphates and citric.acid is roughly 18 times higher at quality 8 (0.3424) than at quality 3 (0.0193).
As I expected, higher-quality wines tend to have more alcohol, which overshadows the very weak effect of higher sulphates.
The general trend is that higher sulphates, lower volatile.acidity and higher alcohol go with higher quality.
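The fold differences can be computed directly from the medians in the by-quality summaries. In the small Python sketch below, the values are copied from the tables above, and each ratio is the quality-8 median divided by the quality-3 median.

```python
# Median at quality 3 and at quality 8, copied from the by-quality summaries above.
medians = {
    "citric.acid": (0.035, 0.42),
    "alcohol": (9.925, 12.15),
    "volatile.acidity": (0.845, 0.37),
    "pH": (3.39, 3.23),
    "sulphates*citric.acid": (0.0193, 0.3424),
}

# Ratio of the quality-8 median to the quality-3 median for each feature.
for name, (q3_median, q8_median) in medians.items():
    print(f"{name:>24}: {q8_median / q3_median:6.2f}x")
```

A ratio above 1 means the feature is higher in the best wines. The ratio is about 12 for citric.acid and about 18 for the sulphates and citric.acid product, against about 1.2 for alcohol. Volatile.acidity and pH fall below 1.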
##
## Calls:
## m4: lm(formula = quality ~ alcohol, data = wines)
## m5: lm(formula = quality ~ alcohol + sulphates, data = wines)
## m6: lm(formula = quality ~ alcohol + sulphates + citric.acid, data = wines)
## m7: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity,
## data = wines)
## m8: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density, data = wines)
## m9: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity, data = wines)
## m10: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides, data = wines)
## m11: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides + residual.sugar,
## data = wines)
## m12: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides + residual.sugar +
## total.sulfur.dioxide, data = wines)
## m13: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides + residual.sugar +
## total.sulfur.dioxide + pH, data = wines)
##
## ============================================================================================================================================
## m4 m5 m6 m7 m8 m9 m10 m11 m12 m13
## --------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept) 1.875*** 1.375*** 1.434*** 1.138*** 62.356*** 30.401* 31.514* 42.871* 47.404** 25.493
## (0.175) (0.177) (0.176) (0.214) (15.472) (15.163) (15.111) (17.775) (17.729) (21.142)
## alcohol 0.361*** 0.346*** 0.338*** 0.346*** 0.296*** 0.298*** 0.281*** 0.271*** 0.249*** 0.275***
## (0.017) (0.016) (0.016) (0.016) (0.021) (0.020) (0.020) (0.022) (0.023) (0.026)
## sulphates 0.994*** 0.814*** 0.821*** 0.881*** 0.732*** 0.885*** 0.905*** 0.955*** 0.929***
## (0.102) (0.107) (0.106) (0.107) (0.104) (0.112) (0.113) (0.114) (0.114)
## citric.acid 0.513*** 0.312* 0.278* -0.460*** -0.325* -0.344* -0.215 -0.231
## (0.093) (0.125) (0.125) (0.137) (0.141) (0.142) (0.145) (0.145)
## fixed.acidity 0.033* 0.076*** 0.077*** 0.070*** 0.078*** 0.066*** 0.031
## (0.013) (0.017) (0.017) (0.017) (0.018) (0.018) (0.026)
## density -61.296*** -28.268 -29.221 -40.621* -44.859* -21.594
## (15.490) (15.198) (15.145) (17.822) (17.772) (21.575)
## volatile.acidity -1.302*** -1.195*** -1.192*** -1.125*** -1.124***
## (0.116) (0.119) (0.119) (0.120) (0.120)
## chlorides -1.444*** -1.470*** -1.646*** -1.825***
## (0.408) (0.408) (0.409) (0.419)
## residual.sugar 0.017 0.029* 0.020
## (0.014) (0.014) (0.015)
## total.sulfur.dioxide -0.002*** -0.002***
## (0.001) (0.001)
## pH -0.361
## (0.190)
## --------------------------------------------------------------------------------------------------------------------------------------------
## R-squared 0.2 0.3 0.3 0.3 0.3 0.3 0.4 0.4 0.4 0.4
## adj. R-squared 0.2 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.4 0.4
## sigma 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.6 0.6
## F 468.3 295.0 210.5 159.8 132.2 140.0 122.6 107.5 98.2 88.9
## p 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## Log-likelihood -1721.1 -1675.1 -1660.0 -1657.0 -1649.2 -1587.9 -1581.6 -1580.9 -1573.0 -1571.2
## Deviance 805.9 760.9 746.6 743.9 736.6 682.2 676.9 676.3 669.6 668.1
## AIC 3448.1 3358.3 3329.9 3326.1 3312.5 3191.8 3181.2 3181.8 3168.0 3166.3
## BIC 3464.2 3379.8 3356.8 3358.4 3350.1 3234.8 3229.6 3235.5 3227.1 3230.9
## N 1599 1599 1599 1599 1599 1599 1599 1599 1599 1599
## ============================================================================================================================================
This seems to be a poor model: even with many features, R-squared reaches only about 0.36 (shown as 0.4 after rounding).
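The model table rounds R-squared to one decimal, but the exact value can be recovered from the F statistic and the sample size. For ordinary least squares with an intercept and k predictors, F = (R²/k) / ((1 − R²)/(n − k − 1)). This rearranges to R² = kF / (kF + n − k − 1). The Python sketch below applies this; its helper names are hypothetical.

```python
def r_squared_from_f(f_stat, n_obs, n_predictors):
    """Recover R-squared from an overall F statistic: R2 = kF / (kF + n - k - 1)."""
    k = n_predictors
    return k * f_stat / (k * f_stat + n_obs - k - 1)

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R2)(n - 1) / (n - k - 1)."""
    k = n_predictors
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)

# F statistics and N from the model comparison table (m4: 1 predictor, m13: 10).
for name, f_stat, n_obs, k in [("m4", 468.3, 1599, 1), ("m13", 88.9, 1599, 10)]:
    r2 = r_squared_from_f(f_stat, n_obs, k)
    adj = adjusted_r_squared(r2, n_obs, k)
    print(f"{name}: R-squared ~ {r2:.3f}, adjusted ~ {adj:.3f}")
```

This gives about 0.227 for m4 and about 0.359 for m13 (adjusted about 0.355), so the best model explains roughly 36% of the variance.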
there are obvious corrolations between pH and fixed.acidity and free.sulfur.dioxide and free sulfur.dioxide. although, there is a week corrolation between quality and sulphates, citric.acid and chlorides, there are many data points/samples that do not seem to have any corrolation between the features. for instance there are a lot of fluctuations in the line plot for sulfates vs. alcohol for different qualities.
There were a couple of interesting relationships, notably the one between alcohol and density: wines with higher alcohol seem to have lower density on average. There is also a very weak negative correlation between density and quality.
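Squaring the correlations from the `cor(wines)` output puts numbers on "very weak": the square is the share of variance one variable linearly explains in the other. The sketch also adds approximate 95% confidence intervals using the Fisher z-transform. That technique is not part of the original analysis, and it assumes the 1599 rows are independent observations.

```r
# Pearson correlations copied from the cor(wines) output; n from str(wines)
n <- 1599
r <- c(alcohol_vs_density = -0.49617977, density_vs_quality = -0.17491923)

# Share of variance one variable linearly explains in the other
shared <- r^2

# Approximate 95% interval via the Fisher z-transform (assumes independent rows)
z <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- rbind(lower = tanh(z - qnorm(0.975) * se),
            upper = tanh(z + qnorm(0.975) * se))

print(round(shared, 3))
print(round(ci, 3))
```

Alcohol and density share roughly 25% of their variance, while density and quality share only about 3%. Both intervals stay below zero, so the density/quality relationship looks weak but not absent.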
I created a linear model of quality on alcohol. Alcohol alone explained only about 0.2 (20%) of the variance in quality. Adding further features raised the R-squared to about 0.4.
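The m4 to m13 sequence adds one predictor at a time. A small helper can build such nested models from a character vector of predictor names with `reformulate()`. `fit_nested()` and `nested_r2()` are hypothetical names, and the demo uses synthetic data rather than the wine data.

```r
# Fit one lm per step, adding predictors in the given order
fit_nested <- function(df, response, predictors) {
  lapply(seq_along(predictors), function(i) {
    lm(reformulate(predictors[seq_len(i)], response = response), data = df)
  })
}

# R-squared of each fitted model
nested_r2 <- function(models) {
  vapply(models, function(m) summary(m)$r.squared, numeric(1))
}

# Synthetic toy data, for illustration only (not the wine data)
set.seed(1)
toy <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
toy$y <- 2 * toy$a - toy$b + rnorm(100)

models <- fit_nested(toy, "y", c("a", "b", "c"))
print(round(nested_r2(models), 3))
```

With the data loaded as `wines`, the m4 to m13 sequence would correspond to `predictors <- c("alcohol", "sulphates", "citric.acid", "fixed.acidity", "density", "volatile.acidity", "chlorides", "residual.sugar", "total.sulfur.dioxide", "pH")`. The resulting list of nested models could also be compared with `do.call(anova, models)`.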
The majority of samples are of quality 5 or 6.
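The quality counts printed in this report make this concrete. The sketch below also computes the accuracy of always predicting the most common score. That is a naive baseline, useful if quality is later treated as a classification target.

```r
# Number of wines per quality score, copied from the counts in this report
quality_counts <- c(`3` = 10, `4` = 53, `5` = 681, `6` = 638, `7` = 199, `8` = 18)
total <- sum(quality_counts)

# Fraction of wines scored 5 or 6
share_5_or_6 <- sum(quality_counts[c("5", "6")]) / total
# Accuracy of always predicting the most common score
majority_baseline <- max(quality_counts) / total

print(round(c(share_5_or_6 = share_5_or_6, majority_baseline = majority_baseline), 3))
```

About 82% of the wines score 5 or 6, and always predicting 5 would already be right about 43% of the time.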
Alcohol and volatile.acidity show the largest correlations with quality: higher alcohol seems to correspond to higher quality, and higher volatile acidity to lower quality.
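Ranking the quality column of the `cor(wines)` output by absolute value makes the ordering explicit. The values below are copied from that output, with the index column X left out.

```r
# Correlation of each feature with quality, copied from the cor(wines) output
# (index column X omitted)
cor_quality <- c(
  fixed.acidity = 0.12405165, volatile.acidity = -0.39055778,
  citric.acid = 0.22637251, residual.sugar = 0.01373164,
  chlorides = -0.12890656, free.sulfur.dioxide = -0.05065606,
  total.sulfur.dioxide = -0.18510029, density = -0.17491923,
  pH = -0.05773139, sulphates = 0.25139708, alcohol = 0.47616632
)

# Sort by strength (absolute value), keeping the sign in the output
ranked <- cor_quality[order(abs(cor_quality), decreasing = TRUE)]
print(round(ranked, 3))
```

Alcohol (about 0.48) and volatile.acidity (about -0.39) lead, followed by sulphates and citric.acid. Every other feature has an absolute correlation below 0.2.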
Facet-wrapping the wines by quality, with volatile.acidity on the x axis and fill by alcohol, shows that higher-quality groups contain more wines with higher alcohol. The counts for higher-alcohol wines are also generally larger. The plot further shows that wines with higher quality have lower volatile acidity. Among wines of quality 3, no sample has an alcohol content above 11.
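The per-quality summaries in this report (which look like `by(..., wines$quality, summary)` output) give a compact numeric check on the trends in this plot. The sketch computes a Spearman rank correlation between quality level and each group's median. It assumes the medians copied below are representative of each group. With only six groups this is a description of direction, not a statistical test.

```r
quality <- 3:8
# Group medians copied from the per-quality summaries
median_alcohol  <- c(9.925, 10.00, 9.7, 10.50, 11.50, 12.15)
median_volatile <- c(0.845, 0.670, 0.580, 0.490, 0.370, 0.370)

# Rank correlation between quality level and group median
trend <- c(
  alcohol = cor(quality, median_alcohol, method = "spearman"),
  volatile.acidity = cor(quality, median_volatile, method = "spearman")
)
print(round(trend, 3))
```

Median alcohol rises with quality except for a dip at quality 5, while median volatile acidity falls almost monotonically.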
The red wine dataset contains 1599 observations of 13 features. Apart from the row index X and quality, the remaining 11 features are measurable chemical properties of the wine. Quality is an abstract, sensory feature based on expert opinion rather than a direct measurement. I started by examining each feature and its distribution, which for the most part was roughly normal, although some features were long-tailed. Then I looked for relationships between features, and especially between each feature and the outcome, quality. According to the correlation matrix obtained with cor(wines), the features correlated with quality are alcohol, volatile.acidity, sulphates and citric.acid, most of them only weakly. Further observations more or less confirmed these relationships. A few features are related by nature and definition, such as pH and citric.acid, since pH is a measure of acidity.

I tried to fit a linear model to the data. The best model, using most of the variables, accounted for about 40% of the variance in quality. Although this is not a high rate, given the sensory nature of the outcome variable it is a good starting point for building more accurate models. Two options are building non-linear models (for instance, including higher-order polynomial terms) and collecting more data or looking into more complete datasets. Using a classification model instead of regression might also be a good choice, since predicting wine quality in this context seems to be more of a classification problem.
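The toy example below sketches the polynomial option. It compares a straight-line fit with a quadratic one using `poly()` and `anova()`. The data are synthetic, generated with a deliberately curved relationship, so this only illustrates the mechanics and says nothing about the wine data.

```r
# Synthetic data with a curved relationship, for illustration only
set.seed(42)
toy <- data.frame(x = runif(200, 8, 15))
toy$y <- 5 + 0.4 * (toy$x - 11) - 0.1 * (toy$x - 11)^2 + rnorm(200, sd = 0.3)

linear    <- lm(y ~ x, data = toy)
quadratic <- lm(y ~ poly(x, 2), data = toy)  # orthogonal polynomial of degree 2

# F-test of the extra quadratic term (the models are nested)
cmp <- anova(linear, quadratic)
print(cmp)
print(c(linear = summary(linear)$r.squared,
        quadratic = summary(quadratic)$r.squared))
```

On the wine data, the same pattern might be `lm(quality ~ poly(alcohol, 2), data = wines)` compared against `lm(quality ~ alcohol, data = wines)`. Whether the extra term helps there is untested here.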
## [1] 1599 13
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## Warning in data(wines): data set 'wines' not found
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## NULL
## [1] 7.4 7.8 11.2 7.9 7.3 7.5 6.7 5.6 8.9 8.5 8.1 7.6 6.9 6.3
## [15] 7.1 8.3 5.2 5.7 8.8 6.8 4.6 7.7 8.7 6.4 6.6 8.6 10.2 7.0
## [29] 7.2 9.3 8.0 9.7 6.2 5.0 4.7 8.4 10.1 9.4 9.0 8.2 6.1 5.8
## [43] 9.2 11.5 5.4 9.6 12.8 11.0 11.6 12.0 15.0 10.8 11.1 10.0 12.5 11.8
## [57] 10.9 10.3 11.4 9.9 10.4 13.3 10.6 9.8 13.4 10.7 11.9 12.4 12.2 13.8
## [71] 9.1 13.5 10.5 12.6 14.0 13.7 9.5 12.7 12.3 15.6 5.3 11.3 13.0 6.5
## [85] 12.9 14.3 15.5 11.7 13.2 15.9 12.1 5.1 4.9 5.9 6.0 5.5
## [1] 5 6 7 4 8 3
## [1] 1.90 2.60 2.30 1.80 1.60 1.20 2.00 6.10 3.80 3.90 1.70
## [12] 4.40 2.40 1.40 2.50 10.70 5.50 2.10 1.50 5.90 2.80 2.20
## [23] 3.00 3.40 5.10 4.65 1.30 7.30 7.20 2.90 2.70 5.60 3.10
## [34] 3.20 3.30 3.60 4.00 7.00 6.40 3.50 11.00 3.65 4.50 4.80
## [45] 2.95 5.80 6.20 4.20 7.90 3.70 6.70 6.60 2.15 5.20 2.55
## [56] 15.50 4.10 8.30 6.55 4.60 4.30 5.15 6.30 6.00 8.60 7.50
## [67] 2.25 4.25 2.85 3.45 2.35 2.65 9.00 8.80 5.00 1.65 2.05
## [78] 0.90 8.90 8.10 4.70 1.75 7.80 12.90 13.40 5.40 15.40 3.75
## [89] 13.80 5.70 13.90
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
##
## 3 4 5 6 7 8
## 10 53 681 638 199 18
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Stacking not well defined when ymin != 0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
##
## 8.4 8.5 8.7 8.8
## 2 1 2 2
## 9 9.05 9.1 9.2
## 30 1 23 72
## 9.23333333333333 9.25 9.3 9.4
## 1 1 59 103
## 9.5 9.55 9.56666666666667 9.6
## 139 2 1 59
## 9.7 9.8 9.9 9.95
## 54 78 49 1
## 10 10.0333333333333 10.1 10.2
## 67 2 47 46
## 10.3 10.4 10.5 10.55
## 33 41 67 2
## 10.6 10.7 10.75 10.8
## 28 27 1 42
## 10.9 11 11.0666666666667 11.1
## 49 59 1 27
## 11.2 11.3 11.4 11.5
## 36 32 32 30
## 11.6 11.7 11.8 11.9
## 15 23 29 20
## 11.95 12 12.1 12.2
## 1 21 13 12
## 12.3 12.4 12.5 12.6
## 12 13 21 6
## 12.7 12.8 12.9 13
## 9 17 9 6
## 13.1 13.2 13.3 13.4
## 2 1 3 3
## 13.5 13.5666666666667 13.6 14
## 1 1 4 7
## 14.9
## 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
##
## 4.6 4.7 4.9 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6 6.1
## 1 1 1 6 4 6 4 5 1 14 2 4 9 13 16
## 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7 7.1 7.2 7.3 7.4 7.5 7.6
## 20 14 25 17 37 28 46 38 50 57 67 44 44 52 46
## 7.7 7.8 7.9 8 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 9 9.1
## 49 53 42 42 26 45 40 26 19 27 24 34 33 26 29
## 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10 10.1 10.2 10.3 10.4 10.5 10.6
## 16 22 17 14 17 9 15 26 23 10 19 11 21 12 14
## 10.7 10.8 10.9 11 11.1 11.2 11.3 11.4 11.5 11.6 11.7 11.8 11.9 12 12.1
## 10 10 8 3 9 5 7 5 13 12 3 3 12 7 1
## 12.2 12.3 12.4 12.5 12.6 12.7 12.8 12.9 13 13.2 13.3 13.4 13.5 13.7 13.8
## 4 5 4 7 4 4 5 2 3 3 3 1 1 2 1
## 14 14.3 15 15.5 15.6 15.9
## 1 1 2 2 2 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
##
## 0.12 0.16 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27
## 3 2 10 2 3 6 6 5 13 7 16 14
## 0.28 0.29 0.295 0.3 0.305 0.31 0.315 0.32 0.33 0.34 0.35 0.36
## 23 16 1 16 2 30 2 23 20 30 22 38
## 0.365 0.37 0.38 0.39 0.395 0.4 0.41 0.415 0.42 0.43 0.44 0.45
## 2 24 35 35 2 37 33 3 31 43 23 22
## 0.46 0.47 0.475 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.545 0.55
## 31 21 2 24 35 46 24 33 29 31 5 20
## 0.56 0.565 0.57 0.575 0.58 0.585 0.59 0.595 0.6 0.605 0.61 0.615
## 34 1 28 3 38 3 39 1 47 3 27 6
## 0.62 0.625 0.63 0.635 0.64 0.645 0.65 0.655 0.66 0.665 0.67 0.675
## 24 3 29 9 27 12 16 7 26 3 23 3
## 0.68 0.685 0.69 0.695 0.7 0.705 0.71 0.715 0.72 0.725 0.73 0.735
## 12 11 23 7 10 6 3 12 5 9 6 8
## 0.74 0.745 0.75 0.755 0.76 0.765 0.77 0.775 0.78 0.785 0.79 0.795
## 11 5 6 3 5 5 6 4 10 8 2 2
## 0.8 0.805 0.81 0.815 0.82 0.825 0.83 0.835 0.84 0.845 0.85 0.855
## 3 1 2 3 5 1 4 4 8 1 2 3
## 0.86 0.865 0.87 0.875 0.88 0.885 0.89 0.895 0.9 0.91 0.915 0.92
## 2 1 4 2 5 5 1 1 3 3 4 1
## 0.935 0.95 0.955 0.96 0.965 0.975 0.98 1 1.005 1.01 1.02 1.025
## 2 1 1 3 3 1 3 3 1 1 4 1
## 1.035 1.04 1.07 1.09 1.115 1.13 1.18 1.185 1.24 1.33 1.58
## 1 3 1 1 1 1 1 1 1 2 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
##
## 8.4 8.5 8.7 8.8
## 2 1 2 2
## 9 9.05 9.1 9.2
## 30 1 23 72
## 9.23333333333333 9.25 9.3 9.4
## 1 1 59 103
## 9.5 9.55 9.56666666666667 9.6
## 139 2 1 59
## 9.7 9.8 9.9 9.95
## 54 78 49 1
## 10 10.0333333333333 10.1 10.2
## 67 2 47 46
## 10.3 10.4 10.5 10.55
## 33 41 67 2
## 10.6 10.7 10.75 10.8
## 28 27 1 42
## 10.9 11 11.0666666666667 11.1
## 49 59 1 27
## 11.2 11.3 11.4 11.5
## 36 32 32 30
## 11.6 11.7 11.8 11.9
## 15 23 29 20
## 11.95 12 12.1 12.2
## 1 21 13 12
## 12.3 12.4 12.5 12.6
## 12 13 21 6
## 12.7 12.8 12.9 13
## 9 17 9 6
## 13.1 13.2 13.3 13.4
## 2 1 3 3
## 13.5 13.5666666666667 13.6 14
## 1 1 4 7
## 14.9
## 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
##
## 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
## 132 33 50 30 29 20 24 22 33 30 35 15 27 18 21
## 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
## 19 9 16 22 21 25 33 27 25 51 27 38 20 19 21
## 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44
## 30 30 32 25 24 13 20 19 14 28 29 16 29 15 23
## 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
## 22 19 18 23 68 20 13 17 14 13 12 8 9 9 8
## 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74
## 9 2 1 10 9 7 14 2 11 4 2 1 1 3 4
## 0.75 0.76 0.78 0.79 1
## 1 3 1 1 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Stacking not well defined when ymin != 0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
##
## 0.9 1.2 1.3 1.4 1.5 1.6 1.65 1.7 1.75 1.8 1.9 2 2.05 2.1 2.15
## 2 8 5 35 30 58 2 76 2 129 117 156 2 128 2
## 2.2 2.25 2.3 2.35 2.4 2.5 2.55 2.6 2.65 2.7 2.8 2.85 2.9 2.95 3
## 131 1 109 1 86 84 1 79 1 39 49 1 24 1 25
## 3.1 3.2 3.3 3.4 3.45 3.5 3.6 3.65 3.7 3.75 3.8 3.9 4 4.1 4.2
## 7 15 11 15 1 2 8 1 4 1 8 6 11 6 5
## 4.25 4.3 4.4 4.5 4.6 4.65 4.7 4.8 5 5.1 5.15 5.2 5.4 5.5 5.6
## 1 8 4 4 6 2 1 3 1 5 1 3 1 8 6
## 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4 6.55 6.6 6.7 7 7.2 7.3 7.5
## 1 4 3 4 4 3 2 3 2 2 2 1 1 1 1
## 7.8 7.9 8.1 8.3 8.6 8.8 8.9 9 10.7 11 12.9 13.4 13.8 13.9 15.4
## 2 3 2 3 1 2 1 1 1 2 1 1 2 1 2
## 15.5
## 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
##
## 0.012 0.034 0.038 0.039 0.041 0.042 0.043 0.044 0.045 0.046 0.047 0.048
## 2 1 2 4 4 3 1 5 4 4 4 8
## 0.049 0.05 0.051 0.052 0.053 0.054 0.055 0.056 0.057 0.058 0.059 0.06
## 8 12 1 10 5 13 8 9 10 14 17 16
## 0.061 0.062 0.063 0.064 0.065 0.066 0.067 0.068 0.069 0.07 0.071 0.072
## 11 24 22 20 23 32 27 30 21 35 47 24
## 0.073 0.074 0.075 0.076 0.077 0.078 0.079 0.08 0.081 0.082 0.083 0.084
## 35 55 45 51 47 51 43 66 40 46 35 49
## 0.085 0.086 0.087 0.088 0.089 0.09 0.091 0.092 0.093 0.094 0.095 0.096
## 25 31 25 32 25 21 19 22 21 19 23 18
## 0.097 0.098 0.099 0.1 0.101 0.102 0.103 0.104 0.105 0.106 0.107 0.108
## 18 12 8 13 5 10 7 16 6 8 9 1
## 0.109 0.11 0.111 0.112 0.113 0.114 0.115 0.116 0.117 0.118 0.119 0.12
## 3 8 7 6 1 11 5 2 4 8 3 3
## 0.121 0.122 0.123 0.124 0.125 0.126 0.127 0.128 0.132 0.136 0.137 0.143
## 2 7 6 3 1 1 1 1 4 1 1 1
## 0.145 0.146 0.147 0.148 0.152 0.153 0.157 0.159 0.161 0.165 0.166 0.168
## 1 1 1 1 2 1 3 1 1 1 3 1
## 0.169 0.17 0.171 0.172 0.174 0.176 0.178 0.186 0.19 0.194 0.2 0.205
## 1 1 2 1 1 1 2 1 1 1 1 2
## 0.213 0.214 0.216 0.222 0.226 0.23 0.235 0.236 0.241 0.243 0.25 0.263
## 1 3 1 1 2 1 1 1 1 1 1 1
## 0.267 0.27 0.332 0.337 0.341 0.343 0.358 0.36 0.368 0.369 0.387 0.401
## 1 1 1 1 1 1 1 1 1 1 1 1
## 0.403 0.413 0.414 0.415 0.422 0.464 0.467 0.61 0.611
## 1 1 2 3 1 1 1 1 1
I transformed the long-tailed data to understand it better.
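One quick way to see what a log transform does to a long-tailed variable is to compare sample skewness before and after the transform. Base R has no built-in skewness function, so `skewness()` below is a hypothetical moment-based helper. The data are synthetic log-normal draws, not the wine data. In ggplot2, the same idea is usually applied on the axis with `scale_x_log10()`.

```r
# Moment-based sample skewness: mean cubed deviation divided by the
# cube of the (population) standard deviation
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Synthetic right-skewed data (log-normal), for illustration only
set.seed(7)
x <- rlnorm(5000, meanlog = 0, sdlog = 1)

print(round(c(raw = skewness(x), log10 = skewness(log10(x))), 2))
```

The raw values are strongly right-skewed, while their log10 is close to symmetric, which is why the transformed histograms are easier to read.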
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
### Another transformation across the y axis
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Stacking not well defined when ymin != 0
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
##
## 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 3 4 14 14 27 26 29 28 33 35 26 27 35 29 33
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## 25 25 34 36 27 24 30 43 20 14 32 20 17 20 26
## 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## 12 26 31 16 17 14 26 18 23 20 17 24 21 21 11
## 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
## 11 15 14 20 13 10 6 14 9 18 9 9 13 10 17
## 66 67 68 69 70 71 72 73 74 75 76 77 77.5 78 79
## 9 12 10 8 8 7 10 7 8 5 3 8 2 4 5
## 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
## 4 6 4 2 6 9 10 6 14 9 5 7 8 2 8
## 95 96 98 99 100 101 102 103 104 105 106 108 109 110 111
## 4 5 7 6 3 4 6 2 5 5 6 3 4 6 3
## 112 113 114 115 116 119 120 121 122 124 125 126 127 128 129
## 3 4 2 2 1 7 2 4 3 3 2 1 2 2 3
## 130 131 133 134 135 136 139 140 141 142 143 144 145 147 148
## 1 3 3 2 2 2 1 1 3 1 2 3 3 3 2
## 149 151 152 153 155 160 165 278 289
## 1 2 1 1 1 1 1 1 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
##
## 1 2 3 4 5 5.5 6 7 8 9 10 11 12 13 14
## 3 1 49 41 104 1 138 71 56 62 79 59 75 57 50
## 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## 78 61 60 46 39 30 41 22 32 34 24 32 29 23 23
## 30 31 32 33 34 35 36 37 37.5 38 39 40 40.5 41 42
## 16 20 22 11 18 15 11 3 2 9 5 6 1 7 3
## 43 45 46 47 48 50 51 52 53 54 55 57 66 68 72
## 3 3 1 1 4 2 4 3 1 1 2 1 1 2 1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
##
## 0.33 0.37 0.39 0.4 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52
## 1 2 6 4 5 8 16 12 18 19 29 31 27 26 47
## 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67
## 51 68 50 60 55 68 51 69 45 61 48 46 41 42 36
## 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82
## 35 23 33 26 28 26 26 20 25 26 23 18 19 15 22
## 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97
## 15 13 14 13 13 7 7 8 8 5 10 4 2 3 6
## 0.98 0.99 1 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1 1.11 1.12
## 2 3 1 1 3 2 2 3 4 2 3 1 2 1 1
## 1.13 1.14 1.15 1.16 1.17 1.18 1.2 1.22 1.26 1.28 1.31 1.33 1.34 1.36 1.56
## 2 2 1 1 5 3 1 1 1 2 1 1 1 3 1
## 1.59 1.61 1.62 1.95 1.98 2
## 1 1 1 2 1 1
## [1] 1599
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
##
## 0.99007 0.9902 0.99064 0.9908 0.99084 0.9912 0.9915 0.99154 0.99157
## 2 1 2 1 1 1 1 1 1
## 0.9916 0.99162 0.9917 0.99182 0.99191 0.9921 0.9922 0.99235 0.99236
## 2 1 1 2 1 1 2 1 1
## 0.9924 0.99242 0.99252 0.99256 0.99258 0.99264 0.9927 0.9928 0.99286
## 3 2 1 1 3 1 1 2 1
## 0.9929 0.99292 0.99294 0.99306 0.99314 0.99316 0.99318 0.9932 0.99322
## 1 1 2 1 1 2 1 1 1
## 0.99323 0.99328 0.9933 0.99331 0.99332 0.99334 0.99336 0.9934 0.99341
## 1 1 1 2 1 1 1 4 1
## 0.99344 0.99346 0.99348 0.9935 0.99352 0.99354 0.99356 0.99357 0.99358
## 1 3 1 1 2 2 4 1 3
## 0.9936 0.99362 0.99364 0.9937 0.99371 0.99374 0.99376 0.99378 0.99379
## 2 2 1 2 2 2 3 3 1
## 0.9938 0.99384 0.99385 0.99386 0.99387 0.99388 0.99392 0.99394 0.99395
## 1 1 1 1 1 2 2 1 1
## 0.99396 0.99397 0.994 0.99402 0.99408 0.9941 0.99414 0.99416 0.99417
## 3 1 2 4 3 1 2 1 1
## 0.99418 0.99419 0.9942 0.99425 0.99426 0.99428 0.9943 0.99434 0.99437
## 2 2 3 1 1 1 2 1 1
## 0.99438 0.99439 0.9944 0.99444 0.99448 0.99451 0.99454 0.99456 0.99458
## 5 1 3 4 4 1 1 1 4
## 0.99459 0.9946 0.99462 0.99464 0.99467 0.99468 0.9947 0.99471 0.99472
## 1 5 2 2 2 1 6 3 3
## 0.99473 0.99474 0.99476 0.99478 0.99479 0.9948 0.99483 0.99484 0.99486
## 1 1 3 2 1 9 1 3 1
## 0.99488 0.99489 0.9949 0.99491 0.99492 0.99494 0.99495 0.99496 0.99498
## 4 3 4 1 2 4 2 1 5
## 0.99499 0.995 0.99501 0.99502 0.99504 0.99506 0.99508 0.99509 0.9951
## 1 10 1 2 2 1 3 1 4
## 0.99512 0.99514 0.99516 0.99517 0.99518 0.99519 0.9952 0.99521 0.99522
## 2 5 6 1 3 1 9 1 4
## 0.99523 0.99524 0.99525 0.99526 0.99528 0.99529 0.9953 0.99531 0.99532
## 1 4 2 2 3 1 4 2 1
## 0.99533 0.99534 0.99536 0.99538 0.9954 0.99541 0.99542 0.99543 0.99544
## 1 6 2 11 4 1 1 2 1
## 0.99545 0.99546 0.99547 0.99549 0.9955 0.99551 0.99552 0.99553 0.99554
## 3 7 2 2 14 3 5 1 3
## 0.99555 0.99556 0.99557 0.99558 0.9956 0.99562 0.99564 0.99565 0.99566
## 1 2 3 3 14 4 2 3 4
## 0.99568 0.99569 0.9957 0.99572 0.99573 0.99574 0.99575 0.99576 0.99577
## 4 1 6 9 1 2 2 5 3
## 0.99578 0.9958 0.99581 0.99582 0.99584 0.99585 0.99586 0.99587 0.99588
## 3 14 1 1 2 3 6 2 4
## 0.99589 0.9959 0.99592 0.99593 0.99594 0.99596 0.99598 0.99599 0.996
## 1 13 4 2 1 2 2 2 13
## 0.99603 0.99604 0.99605 0.99606 0.99608 0.99609 0.9961 0.99612 0.99613
## 2 3 3 2 2 1 10 6 4
## 0.99614 0.99615 0.99616 0.99617 0.99619 0.9962 0.99621 0.99622 0.99623
## 2 5 7 1 1 28 1 5 2
## 0.99624 0.99625 0.99627 0.99628 0.99629 0.9963 0.99631 0.99632 0.99633
## 3 3 3 3 2 15 1 4 4
## 0.99634 0.99635 0.99636 0.99638 0.99639 0.9964 0.99641 0.99642 0.99643
## 3 1 5 5 2 25 1 3 1
## 0.99645 0.99646 0.99647 0.99648 0.99649 0.9965 0.99651 0.99652 0.99654
## 1 1 2 3 1 11 1 6 2
## 0.99655 0.99656 0.99658 0.99659 0.9966 0.99661 0.99664 0.99665 0.99666
## 6 5 1 2 23 1 3 1 3
## 0.99667 0.99668 0.99669 0.9967 0.99672 0.99674 0.99675 0.99676 0.99677
## 1 4 2 13 5 2 5 3 2
## 0.99678 0.9968 0.99682 0.99683 0.99684 0.99685 0.99686 0.99688 0.99689
## 1 35 2 2 1 8 3 2 4
## 0.9969 0.99692 0.99693 0.99694 0.99695 0.99697 0.99698 0.99699 0.997
## 18 4 2 3 1 1 1 1 24
## 0.99701 0.99702 0.99704 0.99705 0.99706 0.99708 0.99709 0.9971 0.99712
## 2 4 3 1 2 4 1 13 4
## 0.99713 0.99714 0.99716 0.99717 0.99718 0.99719 0.9972 0.99721 0.99722
## 2 2 2 1 3 1 36 1 1
## 0.99724 0.99725 0.99726 0.99727 0.99728 0.99729 0.9973 0.99732 0.99733
## 4 1 1 1 3 1 18 3 1
## 0.99734 0.99735 0.99736 0.99738 0.99739 0.9974 0.99743 0.99744 0.99745
## 4 6 5 4 1 22 2 2 9
## 0.99746 0.99747 0.99748 0.9975 0.99752 0.99754 0.99756 0.99758 0.9976
## 7 2 3 7 1 1 1 1 35
## 0.99761 0.99764 0.99765 0.99768 0.99769 0.9977 0.99772 0.99774 0.99779
## 1 1 1 3 2 4 1 5 1
## 0.9978 0.99782 0.99783 0.99784 0.99785 0.99786 0.99787 0.99788 0.9979
## 26 2 2 1 1 4 3 2 14
## 0.99791 0.99796 0.99798 0.998 0.99801 0.99803 0.99808 0.9981 0.99814
## 1 1 2 29 2 3 1 10 2
## 0.99815 0.99817 0.99818 0.9982 0.99822 0.99823 0.99824 0.99828 0.9983
## 2 2 3 23 1 1 3 2 9
## 0.99832 0.99834 0.99836 0.9984 0.99842 0.99845 0.9985 0.99852 0.99854
## 1 1 2 20 2 1 3 1 1
## 0.99855 0.99859 0.9986 0.99864 0.99865 0.9987 0.99878 0.9988 0.99888
## 2 1 19 1 2 12 1 20 2
## 0.9989 0.99892 0.999 0.99901 0.9991 0.99914 0.99915 0.99918 0.9992
## 2 3 8 1 10 3 1 1 7
## 0.99922 0.99925 0.9993 0.99935 0.99938 0.99939 0.9994 0.9995 0.9996
## 1 1 4 1 1 1 24 1 12
## 0.99965 0.9997 0.99974 0.99975 0.99976 0.9998 0.9999 1 1.00005
## 1 8 1 1 1 10 1 10 2
## 1.0001 1.00012 1.00015 1.0002 1.00024 1.00025 1.0003 1.0004 1.0006
## 4 1 2 10 1 1 2 9 6
## 1.0008 1.001 1.0014 1.0015 1.0018 1.0021 1.0022 1.00242 1.0026
## 3 6 6 2 1 2 2 2 2
## 1.00289 1.00315 1.0032 1.00369
## 1 3 1 2
## X fixed.acidity volatile.acidity
## X 1.000000000 -0.26848392 -0.008815099
## fixed.acidity -0.268483920 1.00000000 -0.256130895
## volatile.acidity -0.008815099 -0.25613089 1.000000000
## citric.acid -0.153551355 0.67170343 -0.552495685
## residual.sugar -0.031260835 0.11477672 0.001917882
## chlorides -0.119868519 0.09370519 0.061297772
## free.sulfur.dioxide 0.090479643 -0.15379419 -0.010503827
## total.sulfur.dioxide -0.117849669 -0.11318144 0.076470005
## density -0.368372087 0.66804729 0.022026232
## pH 0.136005328 -0.68297819 0.234937294
## sulphates -0.125306999 0.18300566 -0.260986685
## alcohol 0.245122841 -0.06166827 -0.202288027
## quality 0.066452608 0.12405165 -0.390557780
## citric.acid residual.sugar chlorides
## X -0.15355136 -0.031260835 -0.119868519
## fixed.acidity 0.67170343 0.114776724 0.093705186
## volatile.acidity -0.55249568 0.001917882 0.061297772
## citric.acid 1.00000000 0.143577162 0.203822914
## residual.sugar 0.14357716 1.000000000 0.055609535
## chlorides 0.20382291 0.055609535 1.000000000
## free.sulfur.dioxide -0.06097813 0.187048995 0.005562147
## total.sulfur.dioxide 0.03553302 0.203027882 0.047400468
## density 0.36494718 0.355283371 0.200632327
## pH -0.54190414 -0.085652422 -0.265026131
## sulphates 0.31277004 0.005527121 0.371260481
## alcohol 0.10990325 0.042075437 -0.221140545
## quality 0.22637251 0.013731637 -0.128906560
## free.sulfur.dioxide total.sulfur.dioxide density
## X 0.090479643 -0.11784967 -0.36837209
## fixed.acidity -0.153794193 -0.11318144 0.66804729
## volatile.acidity -0.010503827 0.07647000 0.02202623
## citric.acid -0.060978129 0.03553302 0.36494718
## residual.sugar 0.187048995 0.20302788 0.35528337
## chlorides 0.005562147 0.04740047 0.20063233
## free.sulfur.dioxide 1.000000000 0.66766645 -0.02194583
## total.sulfur.dioxide 0.667666450 1.00000000 0.07126948
## density -0.021945831 0.07126948 1.00000000
## pH 0.070377499 -0.06649456 -0.34169933
## sulphates 0.051657572 0.04294684 0.14850641
## alcohol -0.069408354 -0.20565394 -0.49617977
## quality -0.050656057 -0.18510029 -0.17491923
## pH sulphates alcohol quality
## X 0.13600533 -0.125306999 0.24512284 0.06645261
## fixed.acidity -0.68297819 0.183005664 -0.06166827 0.12405165
## volatile.acidity 0.23493729 -0.260986685 -0.20228803 -0.39055778
## citric.acid -0.54190414 0.312770044 0.10990325 0.22637251
## residual.sugar -0.08565242 0.005527121 0.04207544 0.01373164
## chlorides -0.26502613 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.07037750 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide -0.06649456 0.042946836 -0.20565394 -0.18510029
## density -0.34169933 0.148506412 -0.49617977 -0.17491923
## pH 1.00000000 -0.196647602 0.20563251 -0.05773139
## sulphates -0.19664760 1.000000000 0.09359475 0.25139708
## alcohol 0.20563251 0.093594750 1.00000000 0.47616632
## quality -0.05773139 0.251397079 0.47616632 1.00000000
## Warning: Removed 49 rows containing missing values (geom_point).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
## Warning: position_stack requires non-overlapping x intervals
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0050 0.0350 0.1710 0.3275 0.6600
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0300 0.0900 0.1742 0.2700 1.0000
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2300 0.2437 0.3600 0.7900
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2600 0.2738 0.4300 0.7800
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3050 0.4000 0.3752 0.4900 0.7600
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0300 0.3025 0.4200 0.3911 0.5300 0.7200
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
## Warning: position_stack requires non-overlapping x intervals
## Warning: Removed 1 rows containing non-finite values (stat_summary).
##
## 9.2 9.5 9.7 9.8
## 2 2 2 2
## 9.9 10 10.1 10.2
## 4 9 2 4
## 10.3 10.4 10.5 10.55
## 1 1 10 1
## 10.6 10.7 10.8 10.9
## 6 1 11 5
## 11 11.1 11.2 11.3
## 13 4 10 8
## 11.4 11.5 11.6 11.7
## 3 6 6 13
## 11.8 11.9 12 12.1
## 11 5 9 8
## 12.2 12.3 12.4 12.5
## 4 7 6 10
## 12.6 12.7 12.8 12.9
## 3 3 8 4
## 13 13.1 13.3 13.4
## 2 1 1 2
## 13.5666666666667 13.6 14
## 1 3 3
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.400 9.725 9.925 9.955 10.580 11.000
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 9.60 10.00 10.27 11.00 13.10
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.5 9.4 9.7 9.9 10.2 14.9
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.80 10.50 10.63 11.30 14.00
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.20 10.80 11.50 11.47 12.10 14.00
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.80 11.32 12.15 12.09 12.88 14.00
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.160 3.312 3.390 3.398 3.495 3.630
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.300 3.370 3.382 3.500 3.900
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.880 3.200 3.300 3.305 3.400 3.740
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.860 3.220 3.320 3.318 3.410 4.010
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.920 3.200 3.280 3.291 3.380 3.780
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.880 3.162 3.230 3.267 3.350 3.720
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4400 0.6475 0.8450 0.8845 1.0100 1.5800
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.230 0.530 0.670 0.694 0.870 1.130
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.180 0.460 0.580 0.577 0.670 1.330
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.3800 0.4900 0.4975 0.6000 1.0400
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3000 0.3700 0.4039 0.4850 0.9150
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2600 0.3350 0.3700 0.4233 0.4725 0.8500
## Warning: Removed 162 rows containing non-finite values (stat_smooth).
## Warning: Removed 162 rows containing missing values (geom_point).
##
## Calls:
## m1: lm(formula = density ~ citric.acid, data = subset(wines, citric.acid >
## 0 & citric.acid <= quantile(wines$citric.acid, 0.999)))
##
## =============================
## (Intercept) 0.996***
## (0.000)
## citric.acid 0.004***
## (0.000)
## -----------------------------
## R-squared 0.1
## adj. R-squared 0.1
## sigma 0.0
## F 210.8
## p 0.0
## Log-likelihood 7229.2
## Deviance 0.0
## AIC -14452.3
## BIC -14436.5
## N 1465
## =============================
##
## Calls:
## m2: lm(formula = quality ~ alcohol, data = subset(wines, alcohol >
## 0 & alcohol <= quantile(wines$alcohol, 0.999)))
##
## =============================
## (Intercept) 1.818***
## (0.175)
## alcohol 0.366***
## (0.017)
## -----------------------------
## R-squared 0.2
## adj. R-squared 0.2
## sigma 0.7
## F 480.4
## p 0.0
## Log-likelihood -1715.4
## Deviance 800.7
## AIC 3436.8
## BIC 3452.9
## N 1598
## =============================
##
## Calls:
## m3: lm(formula = quality ~ volatile.acidity, data = wines)
##
## ===============================
## (Intercept) 6.566***
## (0.058)
## volatile.acidity -1.761***
## (0.104)
## -------------------------------
## R-squared 0.2
## adj. R-squared 0.2
## sigma 0.7
## F 287.4
## p 0.0
## Log-likelihood -1794.3
## Deviance 883.2
## AIC 3594.6
## BIC 3610.8
## N 1599
## ===============================
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.04975 0.35250 1.56300 3.21400 5.94000
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.270 0.882 1.757 2.700 9.400
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.950 2.185 2.412 3.572 10.270
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.972 2.764 2.923 4.654 9.112
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.310 4.720 4.288 5.685 9.880
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.420 3.375 5.116 4.624 6.160 8.978
## wines$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.360 5.171 5.320 5.637 5.681 8.514
## --------------------------------------------------------
## wines$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.003 4.753 5.871 6.092 6.612 18.800
## --------------------------------------------------------
## wines$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.705 5.130 5.723 6.145 6.615 19.400
## --------------------------------------------------------
## wines$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.171 5.985 6.763 7.171 8.033 19.300
## --------------------------------------------------------
## wines$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.563 7.326 8.378 8.493 9.564 13.560
## --------------------------------------------------------
## wines$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.150 8.045 9.322 9.257 10.520 11.480

## wines$quality: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.00000 0.00245 0.01930 0.11220 0.23840 0.37620
## --------------------------------------------------------
## wines$quality: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.0000  0.0144  0.0456  0.1290  0.1408  2.0000
## --------------------------------------------------------
## wines$quality: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.0000  0.0510  0.1292  0.1602  0.2310  0.9576
## --------------------------------------------------------
## wines$quality: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.0000  0.0567  0.1680  0.1925  0.2965  0.9044
## --------------------------------------------------------
## wines$quality: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.0000  0.2008  0.2976  0.2837  0.3893  0.7344
## --------------------------------------------------------
## wines$quality: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.0246  0.2164  0.3424  0.2966  0.3675  0.5904
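The grouped blocks above have the print layout of base R's `by()` applied with `summary()` to a single variable per level of `wines$quality` (which variable each block summarises is not shown in this excerpt). A minimal sketch of that pattern, using a small hypothetical data frame `toy` whose `score` column and values are invented for illustration:

```r
# Hypothetical toy data: `score` and its values are illustrative only,
# not taken from the wine data set
toy <- data.frame(
  score   = c(1, 2, 3, 4, 10, 20),
  quality = c(5, 5, 5, 6, 6, 6)
)

# Split `score` by each level of `quality` and apply summary() to each part;
# printing `res` gives one block per level, separated by dashed lines
res <- by(toy$score, toy$quality, summary)
print(res)

# Each element is an ordinary summary; [[ ]] extracts a single statistic
res[[2]][["Median"]]  # median score for the second group (quality 6)
```

When only one statistic per group is needed, `tapply(toy$score, toy$quality, median)` returns it as a named vector instead of printed blocks.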
##
## Calls:
## m4: lm(formula = quality ~ alcohol, data = wines)
## m5: lm(formula = quality ~ alcohol + sulphates, data = wines)
## m6: lm(formula = quality ~ alcohol + sulphates + citric.acid, data = wines)
## m7: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity,
## data = wines)
## m8: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density, data = wines)
## m9: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity, data = wines)
## m10: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides, data = wines)
## m11: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides + residual.sugar,
## data = wines)
## m12: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides + residual.sugar +
## total.sulfur.dioxide, data = wines)
## m13: lm(formula = quality ~ alcohol + sulphates + citric.acid + fixed.acidity +
## density + volatile.acidity + chlorides + residual.sugar +
## total.sulfur.dioxide + pH, data = wines)
##
## =============================================================================================================================================
##                           m4          m5          m6          m7          m8          m9         m10         m11         m12         m13
## ---------------------------------------------------------------------------------------------------------------------------------------------
## (Intercept)              1.875***    1.375***    1.434***    1.138***   62.356***   30.401*     31.514*     42.871*     47.404**    25.493
##                         (0.175)     (0.177)     (0.176)     (0.214)    (15.472)    (15.163)    (15.111)    (17.775)    (17.729)    (21.142)
## alcohol                  0.361***    0.346***    0.338***    0.346***    0.296***    0.298***    0.281***    0.271***    0.249***    0.275***
##                         (0.017)     (0.016)     (0.016)     (0.016)     (0.021)     (0.020)     (0.020)     (0.022)     (0.023)     (0.026)
## sulphates                            0.994***    0.814***    0.821***    0.881***    0.732***    0.885***    0.905***    0.955***    0.929***
##                                     (0.102)     (0.107)     (0.106)     (0.107)     (0.104)     (0.112)     (0.113)     (0.114)     (0.114)
## citric.acid                                      0.513***    0.312*      0.278*     -0.460***   -0.325*     -0.344*     -0.215      -0.231
##                                                 (0.093)     (0.125)     (0.125)     (0.137)     (0.141)     (0.142)     (0.145)     (0.145)
## fixed.acidity                                                0.033*      0.076***    0.077***    0.070***    0.078***    0.066***    0.031
##                                                             (0.013)     (0.017)     (0.017)     (0.017)     (0.018)     (0.018)     (0.026)
## density                                                                -61.296***  -28.268     -29.221     -40.621*    -44.859*    -21.594
##                                                                        (15.490)    (15.198)    (15.145)    (17.822)    (17.772)    (21.575)
## volatile.acidity                                                                    -1.302***   -1.195***   -1.192***   -1.125***   -1.124***
##                                                                                     (0.116)     (0.119)     (0.119)     (0.120)     (0.120)
## chlorides                                                                                       -1.444***   -1.470***   -1.646***   -1.825***
##                                                                                                 (0.408)     (0.408)     (0.409)     (0.419)
## residual.sugar                                                                                               0.017       0.029*      0.020
##                                                                                                             (0.014)     (0.014)     (0.015)
## total.sulfur.dioxide                                                                                                    -0.002***   -0.002***
##                                                                                                                         (0.001)     (0.001)
## pH                                                                                                                                  -0.361
##                                                                                                                                     (0.190)
## ---------------------------------------------------------------------------------------------------------------------------------------------
## R-squared                  0.2         0.3         0.3         0.3         0.3         0.3         0.4         0.4         0.4         0.4
## adj. R-squared             0.2         0.3         0.3         0.3         0.3         0.3         0.3         0.3         0.4         0.4
## sigma                      0.7         0.7         0.7         0.7         0.7         0.7         0.7         0.7         0.6         0.6
## F                        468.3       295.0       210.5       159.8       132.2       140.0       122.6       107.5        98.2        88.9
## p                          0.0         0.0         0.0         0.0         0.0         0.0         0.0         0.0         0.0         0.0
## Log-likelihood         -1721.1     -1675.1     -1660.0     -1657.0     -1649.2     -1587.9     -1581.6     -1580.9     -1573.0     -1571.2
## Deviance                 805.9       760.9       746.6       743.9       736.6       682.2       676.9       676.3       669.6       668.1
## AIC                     3448.1      3358.3      3329.9      3326.1      3312.5      3191.8      3181.2      3181.8      3168.0      3166.3
## BIC                     3464.2      3379.8      3356.8      3358.4      3350.1      3234.8      3229.6      3235.5      3227.1      3230.9
## N                       1599        1599        1599        1599        1599        1599        1599        1599        1599        1599
## =============================================================================================================================================
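The AIC and BIC rows can be cross-checked against the log-likelihood row. The worked check below assumes the table uses R's standard definitions, with $k$ counting every regression coefficient plus the residual standard deviation $\sigma$ (as R's `logLik()` does for `lm` fits) and $N = 1599$. For m4 (intercept, alcohol slope and $\sigma$, so $k = 3$):

$$
\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln N
$$

$$
\begin{aligned}
\mathrm{AIC}_{m4} &= -2(-1721.1) + 2 \cdot 3 = 3442.2 + 6 = 3448.2 \\
\mathrm{BIC}_{m4} &= -2(-1721.1) + 3\ln 1599 \approx 3442.2 + 22.1 = 3464.3
\end{aligned}
$$

Both agree with the tabulated 3448.1 and 3464.2 up to the rounding of $\ln\hat{L}$ to one decimal. Because BIC's penalty per parameter ($\ln 1599 \approx 7.38$) is larger than AIC's 2, the criteria part ways at the end of the sequence: AIC is lowest for m13 (3166.3), while BIC is lowest for m12 (3227.1).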
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: position_stack requires non-overlapping x intervals